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ABSTRACT 

In  this  paper  we  a  representing  a  power  efficient  high  throughput  data  path  architectures.  There  are  few 
previously  proposed  architectures  which  will  also  give  the  same  results  but  we  have  modified  such  that  it  provides  higher 
power  efficiency  and  large  area.  In  this  paper,  we  have  shown  two  different  data  path  arrangements  for  signed  and 
unsigned  operations.  The  expected  outcome  is  to  find  the  feasible  design  that  will  reduce  the  complexity  and  improve  the 
performance.  The  circuits  have  been  systematically  designed  to  reduce  power  consumption. 
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INTRODUCTION 

The  data  path  elements  are  nothing  but  simple  functional  blocks  that  are  present  within  a  microprocessor  that  can 
interact  within  and  perform  computational  operations  (ALU). Basic  tasks  like  reading/ writing  to  memory,  numerical  shift 
operations  arithmetic,  logic  operations,  numerical  shift  operations  and  arithmetic,  logic  operations.  It  consists  of  register 
banks  to  store  the  values  (inputs,  outputs  etc),  control  signals  address  bus,  etc  they  are  all  interconnected  within  the  block. 
In  this  paper,  the  data  path  is  deduced  for  the  two  architectures  and  compared  with  previously  proposed  architectures.  The 
functionality  and  performance  of  the  given  architecture  to  the  proposed  one  is  compared  and  the  results  have  been 
tabulated  show  the  advancement  of  the  present  architecture  to  the  previous  ones.  All  of  these  evaluations  are  done  using 
Verilog  coding  using  Model  sim  and  Quartus  Altera. 

OVERVIEW 

All  the  data  paths  have  been  deduced  using  Verilog  coding  to  check  their  function.  The  execution  and  evaluation 
of  the  architectures  are  done  using  Quartus  Altera.  Evaluation  of  their  performance  can  be  done  in  terms  of  computation 
time,  operating  frequency  and  power  consumption.  We  have  evaluated  the  performance  and  complexity  of  the  two 
architectures.  Our  main  objective  is  to  come  up  with  a  dataflow  path  which  when  compared  to  the  previously  proposed 
architecture  has  Less  complexity,  High  power  efficiency,  High  throughput,  and  speed. 
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LITERATURE  SURVEY 

The  datapath  for  the  architecture  given  isdeduced  accordingly.  Following  which  verilog  coding  is  written  for 
the  same  and  desired  output  is  obtained. 


Architecture  1 


Figure  1 

Figure,  l.is  the  first  architecture  for  which  the  code  is  written  and  simulation  and  evaluation  is  done.  After  the 
evaluation  the  desired  results  are  obtained. 

The  above  architecture  contains  2  (4x2)  multipliers,  6  multiplexers, 3  adders.  This  is  a  three  stage  architecture. 
The  multiplier  stage  consists  of  2  (4x2)  multipliers.  The  second  stage  is  the  muxing  stage  where  here  are  total  6 
multiplexers  in  which  there  are  4  2:1  muxes  and  2  4:1  mux.  Then  the  last  stage  is  adder  stage.  There  are  two  2  3  bit  full 
adders  and  one  4  bit  full  adders  in  this  stage.  The  present  output  of  this  stage  is  fed  as  the  next  input  of  the  mux  stage.  The 
main  advantage  of  this  architecture  is  the  operands  directly  avoid  the  two  multipliers  in  most  cases,  that  except  for 
multiplication  operations.  The  main  disadvantage  of  this  circuit  is  that  only  unsigned  binary  operands  can  be  handled  by 
this  circuit.  No  signed  operations  can  be  done  by  this  circuit. 
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Architecture  2 
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Figure[2]  shows  the  secondly  proposed  architecture.  This  data  path  supports  both  signed  and  unsigned  operations.  It 
consists  of  six  respective  stages.  The  first  stage  of  this  architecture  is  the  muxing  stage.  It  consists  of  4  2:1  multiplexer. 
Following  which  the  second  stage  consists  of  multipliers  and  muxes.  There  are  two  (4x2)  multipliers  and  two  2:1 
multiplexer  in  this  stage.  There  are  3  2:1  multiplexer  after  that.  An  output  from  the  multiplier  stage  and  mux  stages  are  later 
fed  to  compressors.  There  are  three  3:2  compressors  used  in  this  stage.  We  know  compressors  are  basically  full  adders.  The 
outputs  of  this  stage  are  taken  as  the  input  of  the  adder  stage.  The  adder  stage  consists  of  three  full  adders  among  which 
two  are  3  -bit  full  adders  and  one  is  4-bit  full  adder.  The  carry  out  can  be  taken  from  the  out  put  of  the  last  adder  or  Adder 
3.  The  sum  outs  are  taken  to  the  next  stage  that  is  registered  stage.  It  consists  of  three  registers  among  which  two  are  3 -bit 
registers  and  one  is  4-bit  register.  From  the  register  stage  itself,  we  will  be  getting  the  various  outputs.  Also,  the  present 
outputs  of  this  stage  are  taken  as  the  future  input  of  the  multiplexer  stage.  The  main  advantage  of  this  architecture  is  we 
can  perform  both  the  signed  and  unsigned  operations  in  this  architecture. 

METHODOLOGY 

While  comparing  with  previously  proposed  architectures  that  have  been  shown  below.  We  have  modified  the 
given  architecture  in  order  to  obtain  better  results.  Given  below  are  two  such  architectures  that  have  been  previously 
proposed. 
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Figure[5]  is  the  first  architecture  that  was  previously  proposed  which  could  perform  Only  unsigned  operations.it 
consists  of  4  stages.  One  of  the  drawbacks  of  this  architecture  is  its  complexity,  more  power  consumption  and  is  not 
efficient  in  terms  of  the  area  and  also  it  is  not  efficient  to  perform  floating  point  operations  which  are  used  frequently  in 
multimedia  devices  ALU.  Here,  the  multiplication  operation  takes  two  clock  cycles  and  requires  the  use  of  the  entire 
datapath,  as  the  operands  move  through  all  the  multipliers  and  adders,  thus  leading  to  more  computation  time. 


Fig[6]  is  the  second  architecture  that  was  previously  proposed.  It  accepts  8-bit  unsigned  operands.  This  datapath 
consists  of  two  4X4  Wallace  tree  multipliers  and  four  4-bit  ripples  carry  adders.  Due  to  the  use  of  4X4  multipliers,  each 
of  the  datapaths  requires  2  clock  cycles  to  perform  8X8  multiplication  operation.  Another  limitation  of  these 
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datapath  architectures  isits  inability  to  perform  signed  arithmetic  operations!!].  Thus  making  it  inefficient  in  terms  of 
area  and  computation  time. 

V.  RESULTS 


Table  1 


Architectures 

Thermal  Power  Dissipation  (mW) 

Clock  Frequency  (MHz) 

I 

136.2 

320.43 

II 

137.14 

344.83 

Table  2 


Data  Delay  (ns) 

Area  (Total  Registers) 

Operating  Frequency  (MHz) 

Area  (Total  Logic  Elements) 

2.64 

8 

408.27 

36 

2.233 

8 

431.9 

52 

SIMULATION  RESULTS 
Architecture  1 


Figure  5 


Architecture  2 


Figure  6 
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CONCLUSIONS 

The  synthesis  of  the  above  architectures  is  yet  to  be  done  on  45nm  technology. 

We  have  evaluated  the  above  results  in  Quartus  only  but  that  is  not  efficient  and  perfect.  So  we  need  to  do 
the  synthesis  of  these  architectures  to  get  the  perfect  results.  But  from  the  above  results,  one  thing  we  can  say  that  the 
first  architectureis  doing  only  unsigned  operations  whereas  the  second  architecture  is  able  to  perform  both. 

So  it  will  consume  a  bit  of  more  power.  Also  the  second  architecture  takes  a  bit  more  area  in  terms  of  logical 
elements  and  registers.  But  the  second  architecture  is  producing  very  high  throughput.  And  because  it  can  perform  signed 
operations,  it  is  more  efficient  for  floating  point  arithmetic  operations  which  are  used  frequently  in  multimedia  devices. 
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